
(arXiv 2018) An empirical evaluation of generic convolutional and recurrent networks for sequence modeling

Bai S, Kolter J Z, Koltun V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling[J]. arXiv preprint arXiv:1803.01271, 2018.



1. Overview


This paper demonstrates that a simple convolutional architecture, the Temporal Convolutional Network (TCN), outperforms canonical recurrent networks such as LSTMs across a broad range of sequence modeling tasks.

  • TCN. longer effective memory, more accurate, simpler
  • RNN. processes time steps sequentially, so it cannot be parallelized

1.1. Background

1.1.1. Application

  • part-of-speech tagging and semantic role labelling
  • sentence classification
  • document classification
  • machine translation
  • audio synthesis
  • language modeling

1.1.2. Architecture

  • LSTM
  • GRU
  • ConvLSTM
  • Quasi-RNN
  • dilated RNN



2. TCN




2.1. Architecture

  • input. a sequence of any length
  • output. a sequence of the same length
  • no gating mechanism
  • substantially longer memory than recurrent networks of the same capacity
  • 1D causal convolutions with padding, so the output keeps the input length
  • receptive field (see the sketch after this list) controlled by:
  • dilation factor d
  • kernel size k
  • network depth n
  • dilation d = 2^i at level i, so that some filter hits each input within the effective history
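
To make the dilated causal convolution concrete, below is a minimal PyTorch sketch. Module and parameter names (CausalConv1d, TCNStack, num_levels) are illustrative assumptions, not the authors' reference implementation; the residual connections, weight normalization, and dropout of the full TCN are omitted.

```python
import torch
import torch.nn as nn


class CausalConv1d(nn.Module):
    """1D convolution padded only on the left, so the output at time t
    depends on inputs up to time t and the sequence length is preserved."""

    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                              # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.left_pad, 0))   # pad the past only
        return self.conv(x)


class TCNStack(nn.Module):
    """n levels with dilation d = 2^i at level i, so the receptive field
    grows exponentially with depth."""

    def __init__(self, in_ch, hidden_ch, num_levels, kernel_size=3):
        super().__init__()
        layers = []
        for i in range(num_levels):
            layers += [CausalConv1d(in_ch if i == 0 else hidden_ch,
                                    hidden_ch, kernel_size, dilation=2 ** i),
                       nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)                             # same temporal length as x


x = torch.randn(8, 16, 100)                            # (batch, features, time)
y = TCNStack(in_ch=16, hidden_ch=32, num_levels=4)(x)
print(y.shape)                                         # torch.Size([8, 32, 100])
```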

2.2. Advantage

  • parallelism
  • flexible receptive field size (a quick calculation follows this list)
  • stable gradients (the backpropagation path differs from the temporal direction of the sequence)
  • low memory requirement for training
  • variable length inputs
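
As a quick illustration of the flexible receptive field, the helper below computes the history covered by the sketch in Section 2.1, assuming one causal convolution per level and dilation 2^i at level i; the closed form 1 + (k - 1)(2^n - 1) follows from that assumption rather than being quoted from the paper.

```python
def receptive_field(kernel_size: int, num_levels: int) -> int:
    """History covered by a stack with one causal conv per level and
    dilation 2**i at level i: 1 + (k - 1) * (2**n - 1)."""
    rf = 1
    for i in range(num_levels):
        rf += (kernel_size - 1) * (2 ** i)
    return rf


print(receptive_field(kernel_size=3, num_levels=4))    # 31
print(receptive_field(kernel_size=3, num_levels=8))    # 511
```

Doubling the depth (or enlarging the kernel) therefore enlarges the usable history without changing the rest of the architecture.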

2.3. Disadvantage

  • data storage during evaluation. the raw sequence up to the effective history length must be kept, whereas an RNN only stores a fixed-size hidden state
  • potential parameter change for a transfer of domain. different domains may require different amounts of history, so k and d may need re-tuning to keep the receptive field large enough



3. Experiments


3.1. Details

  • gradient clipping with a maximum norm chosen from [0.3, 1] helped convergence (sketch below)
  • TCN is largely insensitive to hyperparameter changes, as long as the effective history (receptive field) size is sufficient
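
A minimal sketch of the clipping step, assuming a standard PyTorch training loop; the linear model, data, and optimizer below are placeholders for illustration, not the paper's setup.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                               # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, target = torch.randn(32, 10), torch.randn(32, 1)    # placeholder batch

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
# Clip the global gradient norm to a maximum chosen from [0.3, 1].
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
optimizer.step()
```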

3.2. Comparison



3.3. Adding Problem
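
As a reference for the task setup, here is a hedged sketch of generating an adding-problem batch: each input has two channels of length T (random values in [0, 1] and a marker channel with exactly two ones), and the target is the sum of the two marked values. The function name and tensor layout are illustrative choices, not the paper's code.

```python
import torch


def adding_problem_batch(batch_size: int, seq_len: int):
    """Two channels per sequence: random values in [0, 1] and a marker
    channel with exactly two ones; the target is the sum of the two
    marked values."""
    values = torch.rand(batch_size, 1, seq_len)
    markers = torch.zeros(batch_size, 1, seq_len)
    for b in range(batch_size):
        idx = torch.randperm(seq_len)[:2]              # two distinct positions
        markers[b, 0, idx] = 1.0
    x = torch.cat([values, markers], dim=1)            # (batch, 2, T)
    y = (values * markers).sum(dim=(1, 2))             # (batch,)
    return x, y


x, y = adding_problem_batch(batch_size=4, seq_len=600)
print(x.shape, y.shape)   # torch.Size([4, 2, 600]) torch.Size([4])
```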



3.4. Controlled Experiments